Introduction

In this report, I combine the different scenarios that were tested to calculate the empirical coverage of the confidence intervals around weighted averages in a meta-analysis. The scenarios are:

  1. CI around weighted average from standardized mean-difference in two-sample test.
  2. CI around weighted average from standardized mean-difference in one-sample test.
  3. CI around weighted average from transforming t-value in linear regression.
  4. CI around weighted average from a univariate fMRI voxel.
  5. CI around weighted average from a 16x16x16 brain.
  6. CI around weighted average in a univariate fMRI voxel with increasing sample size and studies.
  7. CI around weighted average in a multivariate fMRI setting with increasing sample size and studies.
  8. CI around weighted average in simple block design 2 x 2 x 2 multivariate fMRI setting.
  9. CI around weighted average in simple block design 16 x 16 x 16 multivariate fMRI setting.

We will consider 3 types of CI:

  1. The z distribution CI
  2. The t distribution CI
  3. The weighted variance CI

Mathematical definition

Suppose we have \(k\) studies with \(\hat{\theta}_i\) the estimated effect size in study \(i\). For a meta-analysis, consider the estimator of the weighted average (\(\mu\)):

\[ \hat{\mu} = \frac{\sum_i\hat{w_i}\hat{\theta}_i}{\sum_i \hat{w_i}}\] with \(\hat{w_i}\) the estimated optimal weights (either fixed or random effects meta-analysis).
The sampling variance of \(\hat{\mu}\) is given by: \[ V(\hat{\mu}) =\frac{1}{\sum_i\hat{w_i}}\]



\(z\) distribution CI

The z distribution based CI around the weighted average effect size is defined as: \[ \hat{\mu} \pm z_{1-\alpha/2}\sqrt{\hat{V}(\hat{\mu})} \]

\(t\) distribution CI

The t distribution CI with \(k-1\) degrees of freedom is calculated as: \[ \hat{\mu} \pm t_{k-1,1-\alpha/2}\sqrt{\hat{V}(\hat{\mu})} \]

Weighted variance CI

The weighted variance CI assumes a Student’s t distribution with \(k-1\) degrees of freedom. The sampling variance of \(\hat{\mu}\) is extended by applying the following weighting scheme: \[ \hat{V_w}(\hat{\mu}) = \frac{\sum_i\hat{w_i}(\hat{\theta_i} - \hat{\mu})^2}{(k - 1)\sum_i\hat{w_i}} \]

The CI around the weighted average effect size can then be calculated as: \[ \hat{\mu} \pm t_{k-1,1-\alpha/2}\sqrt{\hat{V_w}(\hat{\mu})} \]
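The three interval types above can be collected in one small R helper. This is an illustrative sketch of my own (the function name `metaCI` is not from the report's code):

```r
# Compute the three CI types around a weighted average.
# theta: vector of study effect sizes, v: their sampling variances.
metaCI <- function(theta, v, alpha = 0.05) {
  k <- length(theta)
  w <- 1 / v                            # inverse-variance weights
  mu <- sum(w * theta) / sum(w)         # weighted average
  V.mu <- 1 / sum(w)                    # sampling variance of mu
  # Weighted variance estimator (used with a Student t, k - 1 df)
  V.w <- sum(w * (theta - mu)^2) / ((k - 1) * sum(w))
  z.q <- qnorm(1 - alpha / 2)
  t.q <- qt(1 - alpha / 2, df = k - 1)
  data.frame(
    type  = c("norm", "t", "weightVar"),
    lower = mu - c(z.q, t.q, t.q) * sqrt(c(V.mu, V.mu, V.w)),
    upper = mu + c(z.q, t.q, t.q) * sqrt(c(V.mu, V.mu, V.w))
  )
}
```

Note that the z and t intervals share the same variance estimate and differ only in the critical value, so for small \(k\) the t interval is always wider.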



Scenario 1: standardized mean-difference in two-sample test

We start by simulating two groups \(X_1\) and \(X_2\) that do not differ from each other. There are 15 subjects in each group. Each meta-analysis contains 20 studies. In each simulation, we first calculate the effect size and its variance as follows:

# Sample sizes
N1 <- N2 <- 15
# True value
trueVal <- 0
# Subjects in first group
X1 <- trueVal + rnorm(n=N1,mean=0,sd=2)
S2_X1 <- var(X1)
# Second group
X2 <- trueVal + rnorm(n=N2,mean=0,sd=2)
S2_X2 <- var(X2)
# Sample variance
Swithin <- sqrt((((N1-1)*S2_X1)+((N2-1)*S2_X2))/(N1 + N2 - 2))
# Cohen d
d <- (mean(X1) - mean(X2))/Swithin
# Variance
Vd <- (N1+N2)/(N1*N2) + (d^2/(2*(N1+N2)))

We loop through this until we have 20 studies. The values d and Vd are stored in vectors. The meta-analysis part then consists of:

# Weights
W <- 1/Vd
# Weighted mean
wMean <- sum(W*d)/sum(W)
# Variance of summary effect
varSum <- 1/sum(W)
# Confidence intervals
CI.upper.norm[i] <- wMean + (1.96 * sqrt(varSum))
CI.lower.norm[i] <- wMean - (1.96 * sqrt(varSum))

CI.upper.t[i] <- wMean + (qt(0.975,df=nstud-1) * sqrt(varSum))
CI.lower.t[i] <- wMean - (qt(0.975,df=nstud-1) * sqrt(varSum))

# Weighted variance (dStud is the vector of stored effect sizes d)
CI.weightedVariance <- sum(W*(dStud - wMean)^2)/((nstud - 1) * sum(W))
CI.upper.weightVar[i] <- wMean + (qt(0.975,df=nstud-1) * sqrt(CI.weightedVariance))
CI.lower.weightVar[i] <- wMean - (qt(0.975,df=nstud-1) * sqrt(CI.weightedVariance))

Finally, we calculate the coverage over all simulations (5000) as:

# Coverage
coverage.norm[i] <- ifelse(trueVal >= CI.lower.norm[i] & trueVal <= CI.upper.norm[i], 1, 0)
coverage.t[i] <- ifelse(trueVal >= CI.lower.t[i] & trueVal <= CI.upper.t[i], 1, 0)
coverage.weightVar[i] <- ifelse(trueVal >= CI.lower.weightVar[i] & trueVal <= CI.upper.weightVar[i], 1, 0)

The mean coverages are in this case:

mean(coverage.norm); mean(coverage.t); mean(coverage.weightVar)
## [1] 0.9504
## [1] 0.9598
## [1] 0.9476

Scenario 2: standardized mean-difference in one-sample test

In the second scenario, the effect size in each study is calculated from one sample of subjects, where the null hypothesis states that the mean is equal to 0.
The effect size (Hedges’ \(g\)) and its variance are calculated as: \[ \begin{align} J &= 1-\left(\frac{3}{4(N-1)-1}\right) \\ g &= \frac{t}{\sqrt{N}} \times J \\ V(g) &= \frac{1}{N} + \left(1 - \frac{N - 3}{2} \times \left(\frac{\Gamma[(N - 2)/2]}{\Gamma[(N - 1)/2]}\right)^2\right) \times g^2 \end{align}\]

We now plan 50,000 simulations, each with 80 subjects per study and 50 studies per meta-analysis.
We proceed again by simulating the data for the individual subjects, calculating a \(T\)-value and converting it to an effect size with its variance. These latter two values are stored in vectors.

nsub <- 80
# Subjects in group
X1 <- trueVal + rnorm(n=nsub,mean=0,sd=2)
S2_X1 <- var(X1)

# Hedge g
t <- (mean(X1))/sqrt(S2_X1/nsub)
g <- hedgeG(t,nsub)
Vg <- varHedge(g,nsub)
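The helper functions hedgeG and varHedge are not defined in this report. A minimal reconstruction that follows the formulas above (my own sketch, so treat the exact implementation as an assumption) could look like:

```r
# Hedges' g from a one-sample t-value, with small-sample correction J.
hedgeG <- function(t, N) {
  J <- 1 - (3 / (4 * (N - 1) - 1))  # correction factor
  (t / sqrt(N)) * J                 # Hedges' g
}

# Variance of Hedges' g using the gamma-function expression above.
varHedge <- function(g, N) {
  ratio <- gamma((N - 2) / 2) / gamma((N - 1) / 2)
  (1 / N) + (1 - ((N - 3) / 2) * ratio^2) * g^2
}
```

For moderate \(N\) this is close to the familiar approximation \(1/N + g^2/(2N)\); for very large \(N\), lgamma would avoid overflow in the gamma ratio.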

After 50 iterations (the number of studies), we perform a fixed effects meta-analysis and calculate the confidence intervals as in scenario 1 (same code, not shown here).

The mean values for the three types of confidence intervals are:

mean.coverage.norm;mean.coverage.t;mean.coverage.weightVar
## [1] 0.95294
## [1] 0.95612
## [1] 0.9489

To see whether the number of simulations is sufficient, we can make a plot over the number of simulations run.
We start by taking the first 600 simulations (burn-in) and calculate the mean coverage for the 3 types of CI. We then add 10 simulations and calculate the mean coverage over these 610 simulations. We continue until all simulations are added. This produces the next plot (code not shown):
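Such a running-mean check can be sketched as follows (my own reconstruction, not the report's code; the input is assumed to be a 0/1 coverage indicator vector such as coverage.norm):

```r
# Running mean of the coverage indicators: start after a burn-in of 600
# simulations, then add 10 simulations at a time until all are included.
runningCoverage <- function(cov, burnin = 600, step = 10) {
  checkpoints <- seq(burnin, length(cov), by = step)
  sapply(checkpoints, function(n) mean(cov[1:n]))
}

# Toy example with simulated 0/1 indicators:
set.seed(1)
cov.sim <- rbinom(50000, 1, 0.95)
run <- runningCoverage(cov.sim)
# plot(seq(600, 50000, 10), run, type = "l")
```

If the running mean has flattened out by the last checkpoints, the number of simulations can be considered sufficient.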

We conclude that we have enough simulations. However, there seems to be a small bias. Possibly, it would help to increase the number of subjects and studies even more.

Scenario 3: transforming t-value in linear regression

After looking at the basics, we now move to linear regression as a step towards fMRI. We add this scenario because we only look at 15 subjects and 15 studies, which are the parameters used in simulating fMRI data. In this fictional example, we try to predict the IQ of subjects from their weight. We use 10,000 simulations. Note, however, that we are using formulas that are designed for a one-sample t-test! In each simulation, we first simulate 15 subjects in each of the 15 studies as:

nsub <- 15
# Weight of subjects
X <- 60 + rnorm(n=nsub,mean=0,sd=5)
# IQ
Y <- 100 + rnorm(n=nsub,mean=0,sd=5)

# Linear regression
fit <- lm(Y~X)

# T-value
TVal <- summary(fit)$coefficients[2,3]

# Hedge g
g <- hedgeG(TVal,nsub)
Vg <- varHedge(g,nsub)

We again save the values \(g\) and \(Vg\) into vectors and proceed as before to calculate the empirical coverages of the CIs.

The results are:

mean.coverage.norm;mean.coverage.t;mean.coverage.weightVar
## [1] 0.967
## [1] 0.9812
## [1] 0.9611

The amount of bias is larger than in the previous scenario. Possibly, the number of subjects and studies is too low.

We can again check whether the number of simulations is sufficient:

Scenario 4: univariate fMRI time series (one voxel)

We finally get to simulating fMRI time series. In this scenario, we only simulate one voxel. As a consequence, we cannot do fancy stuff such as spatial smoothing, or even add spatial noise. Hence, the simulations are limited to a time series with white noise only.

Some (incomplete) code for simulating the fMRI time series is shown below:

# Design of the data (no effect)
design.null <- simprepTemporal(regions = 1, onsets = onsets, durations = duration,
                         hrf = "double-gamma", TR = TR, totaltime = total,
                         effectsize = effect.null)
# Define region
regions <- simprepSpatial(regions = 1, coord = coordinates, radius = list(radius), form ="cube", fading = 0)
# Actual simulated data
sim.data <- simVOLfmri(design=design.null, image=regions, base=base, dim=DIM, SNR=0.5,
             type ="gaussian", noise= "mixture", spat="gaussRF", FWHM=2, weights=w, verbose = TRUE)
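The objects referenced in these calls (onsets, duration, TR, total, coordinates, base, w, ...) are defined elsewhere in the simulation script. Plausible values, mirroring the block-design timing used in scenario 8 below (the exact numbers, in particular the noise weights, are an assumption on my part), would be:

```r
# Timing parameters for the simulated time series
TR <- 2                            # repetition time in seconds
nscan <- 200                       # number of scans
total <- TR * nscan                # total scan time (400 s)
onsets <- list(seq(1, total, 40))  # block onsets (20 s ON/OFF)
duration <- list(20)               # block duration in seconds
effect.null <- list(0)             # null data: effect size 0

# Spatial parameters for the single voxel
DIM <- c(1, 1, 1)
coordinates <- list(c(1, 1, 1))
radius <- 0
base <- 100                        # baseline signal

# Relative weights for the noise mixture (must sum to 1), in the order:
# white, temporal, low-frequency, physiological, task-related, spatial
w <- c(0.3, 0.3, 0.01, 0.09, 0.05, 0.25)
```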

The time series for a particular simulation in this one voxel is depicted here:

In order to analyze the data, we use a simple GLM with the design matrix (\(x\)) consisting of 2 columns with the convolved time series. This is depicted in the figure below (red is the first column, blue the second).

The following code shows how we fit the design matrix to the simulated data, extract the beta coefficients and calculate the contrast \((1, -1)\):

LM.sim.data <- array(sim.data, dim=nscan)
fit <- lm(LM.sim.data ~ x[,-3])
b1 <- summary(fit)$coefficients[2]
b2 <- summary(fit)$coefficients[3]
BETAS <- c(b1, b2)
CONTRAST <- c(1, -1)
# Estimated contrast of parameter betas
COPE.sub <- CONTRAST %*% BETAS
COPE[s] <- COPE.sub

The COPE values are saved into a vector, after which we move on to the second-stage GLM of the fMRI analysis, in which we use a simple OLS estimation procedure:

# Group COPE (average)
GCOPE <- mean(COPE,na.rm=TRUE)
# Now we will do the OLS estimation of the variance
GVARCOPE <- var(COPE,na.rm=TRUE)
# TMAP
GTMAP <- GCOPE/sqrt(GVARCOPE/(nsub))

We proceed with the meta-analysis as in the previous scenario.

The results are shown below:

##     Mean        CI
## 1 0.9683      norm
## 2 0.9833         t
## 3 0.9533 weightAvg

We can also look at the distribution of the weighted average values from this one voxel over all simulations.
This is plotted below:

Scenario 5: multivariate fMRI time series (16 x 16 x 16)

In this section, we look at a small grid of voxels. We consider a 16 x 16 x 16 image. The basic parameters are the same as in the previous scenario: we only look at 15 subjects in 15 studies, over 3000 simulations. However, we now have the opportunity to introduce extra noise, such as spatial noise between neighbouring voxels. As a consequence, we investigate the effect of an increasing deviation from white noise only. We simulate 7 scenarios, which are listed below. For scenarios 3–7, we list the true weighting in the following order: white, temporal, low-frequency, physiological, task-related and spatial. This is the relative weighting given to each type of noise.

Note that we induce between-subject variability on this weighting structure when simulating the data. Furthermore, we add two locations which could contain a true signal. However, as we are simulating null data, this is not relevant (it is only included for later, when we add true activation).

The analysis of the linear model is now performed using a function from the fmri package. This is done because it allows us to estimate the temporal correlation (the AR(1) coefficient), pre-whiten the data and directly extract the contrast from the fmri.lm function. In R code, this is:

# Fitting the GLM: estimated AR(1)-coefficients are used to whiten the data.
# This may produce warnings because the data is pre-smoothed.
model <- fmri.lm(datafmri, x, actype = "accalc", keep = "all", contrast = c(1,-1))

# Estimated contrast of parameter betas from the model
COPE.sub <- model$cbeta
COPE[,,,s] <- COPE.sub
VARCOPE.sub <- model$var

The rest of the procedure is similar to the previous scenario (calculating group T-maps, converting the studies to an effect size and performing a fixed effects meta-analysis). We end up with a 16 x 16 x 16 grid of voxels for each of the 3000 simulations. In each voxel, we can check whether or not the true value of the weighted average (0) is in the CI.

The empirical coverages of the CIs are calculated by averaging over all 3000 simulations AND all voxels.

That is, with indicator \(V_{is} = 1\) if the CI in voxel \(i\) contains the true value in simulation \(s\) (and 0 otherwise):

\[ \text{coverage} = \frac{1}{4096}\sum_{i=1}^{4096}\left(\frac{1}{3000} \sum_{s=1}^{3000}V_{is} \right) \]
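This double average can be computed directly; a small sketch (with a toy indicator matrix standing in for the 4096 x 3000 array):

```r
# Empirical coverage as a double average. In the report V would be a
# 4096 x 3000 matrix of 0/1 indicators; a smaller toy matrix is used here.
set.seed(1)
V <- matrix(rbinom(64 * 100, 1, 0.95), nrow = 64)
voxel.cov <- apply(V, 1, mean)  # per-voxel coverage over simulations
coverage <- mean(voxel.cov)     # averaged over the voxels
# With a complete (no-NA) indicator matrix this equals the grand mean:
stopifnot(isTRUE(all.equal(coverage, mean(V))))
```

The per-voxel means (voxel.cov) are also what the SD bars in the coverage plots below are computed from.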

The following three figures show the empirical coverage, the distribution of weighted averages in one voxel over all simulations and the distribution of all voxels over all simulations. The SD bars in the first plot show the variability between all voxels of the grid over all simulations.

Finally, we check whether we have enough simulations for each of the 7 scenarios:

Scenario 6: univariate fMRI time series for increasing sample and study size

As seen in the previous scenario, the coverages are not at the nominal level. However, as suggested in scenario 2, this could be due to a limited sample size and number of studies. To this end, we run 1,500 simulations for the univariate case in which we vary the number of subjects from 10 to 100 in steps of 10, while increasing the number of studies from 2 to 10 in steps of 1. Note, however, that we only look at balanced designs (an equal number of subjects in all studies), meaning we leave out some combinations. An overview is given below:

##    Subjects Studies
## 1        10       2
## 2        20       2
## 3        30       2
## 4        40       2
## 5        50       2
## 6        60       2
## 7        70       2
## 8        80       2
## 9        90       2
## 10      100       2
## 13       30       3
## 16       60       3
## 19       90       3
## 22       20       4
## 24       40       4
## 26       60       4
## 28       80       4
## 30      100       4
## 31       10       5
## 32       20       5
## 33       30       5
## 34       40       5
## 35       50       5
## 36       60       5
## 37       70       5
## 38       80       5
## 39       90       5
## 40      100       5
## 43       30       6
## 46       60       6
## 49       90       6
## 57       70       7
## 64       40       8
## 68       80       8
## 79       90       9
## 81       10      10
## 82       20      10
## 83       30      10
## 84       40      10
## 85       50      10
## 86       60      10
## 87       70      10
## 88       80      10
## 89       90      10
## 90      100      10

The results of the empirical coverage for this one voxel are plotted in the graph below. However, for plotting purposes, I only show the results for 2, 5 and 10 studies (all going from a sample size of 20 to 100).

In the next figure, I have calculated a linear regression model with sample size and number of studies predicting the empirical coverage. The parameter coefficients are then used to predict the coverage for all combinations of sample size and studies. This is plotted in the next graph. Be cautious, however, as these are predictions, not the exact values. Furthermore, 3D graphs can be notoriously misleading!
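The prediction step can be sketched as follows (variable names are mine, and the toy data merely stands in for the simulated coverage results):

```r
# Fit a linear model of coverage on sample size and number of studies,
# then predict coverage over the full grid of combinations.
set.seed(1)
sims <- expand.grid(nsub = seq(10, 100, 10), nstud = 2:10)
# Toy coverage values standing in for the simulation results:
sims$coverage <- 0.9 + 0.0003 * sims$nsub + 0.002 * sims$nstud +
  rnorm(nrow(sims), sd = 0.005)

fit <- lm(coverage ~ nsub + nstud, data = sims)
grid <- expand.grid(nsub = seq(10, 100, 10), nstud = 2:10)
grid$predicted <- predict(fit, newdata = grid)
```

The predicted surface over grid is what the 3D plot displays; it smooths over the simulation noise but can also hide non-linear patterns.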

By increasing the sample size and the number of studies, we see the empirical coverage approach the nominal level. We will now check this in the multivariate approach.

Scenario 7: multivariate fMRI time series for increasing sample and study size

In this scenario, we repeat the simulation setting from scenario 5 with the details from scenario 6. That is, we increase the sample size from 20 to 100 in steps of 10 and the number of studies from 2 to 10, whilst looking at the multivariate 16 x 16 x 16 grid of voxels in an fMRI setting. We only have 350 simulations at this time.

The empirical coverages are plotted below:

Something is definitely going wrong here. However, at this moment I have no idea what exactly causes these weird coverages.

Let us look at just one voxel, number 729, which corresponds to position x = 9, y = 9, z = 9.
We then take the scenario in which we have 2 studies and N = 30 and compare it with 10 studies and N = 30 (scenarios 3 and 38).
We only look at the CI based on the normal distribution:

# Coverages are saved in next vector:
coverage.norm <- mean.coverages$norm
#coverage.norm
str(coverage.norm)
##  num [1:4096, 1:350, 1:45] 1 1 1 1 1 1 1 1 1 1 ...
# Let us look at one voxel:
K2_N30 <- 3
K10_N30 <- 38
coverage.norm[729,,K2_N30]
##   [1]  1  1  1  1  1  1  0 NA  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
##  [24]  0  1  1  0  1  1  1  1  1  1  1  0  0  1  1  1  1  1  0  1  1  1  1
##  [47]  1  1  1  1  1  0  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
##  [70]  1  1  1  1  1  1  1  0  1  1  1  1  1  1  1  1  1  1  1  1  1  0  1
##  [93]  1  1  1  1  0  1  1  0  1  1  0  0  0  1  0  1  1  1  1  1  1  1  1
## [116]  1  1  1  1  0  1  1  1  1  0  1  1  1  1  1  1  1  1  1  1  1  0  1
## [139]  1  1  0  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  0  1  1  0
## [162]  0  1  1  1  1  1  1  1  1  1  1  1  1  0  1  1  1  1  1  1  1  1  1
## [185]  1  1  1  1  1  1  1  1  1  1  1  0  1  1  1  0  1  1  1  1  1  1  1
## [208]  1  1  0  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  0  1
## [231]  1  1  0  1  1  1  1  1  1  1  1  1  1  1  1  1  0  1  0  1  1  1  1
## [254]  1  0  1  1  0  1  0  1  0  1  1  1  1  1  1  1  0  1  0  1  1  0  1
## [277]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  0
## [300]  1  1  0  1  1  1  1  1  0  1  1  1  1  1  0  1  1  1  1  1  1  1  1
## [323]  1  1  1  1  1  1  1  0  1  1  0  1  1  1  1  1  1  0  1  1  1  1  1
## [346]  1  1  0  1  1
coverage.norm[729,,K10_N30]
##   [1] 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 0 1 1 1 1 1 1
##  [36] 1 1 1 1 0 0 1 0 1 1 1 1 1 0 1 1 1 0 0 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1
##  [71] 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 0 0 0 1 1 1 1 1 1 1
## [106] 0 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 0
## [141] 1 0 1 1 0 1 0 1 1 0 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0
## [176] 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 0 1 1 1 1 1 1 1 0 0 1 1 1 1 1 0 1 1 0
## [211] 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1
## [246] 1 1 1 0 0 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 1 0 1 1 1
## [281] 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 0 1 0 1 1 1 1 0 1
## [316] 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1
# Average coverage
mean(coverage.norm[729,,K2_N30],na.rm=TRUE)
## [1] 0.8710602
mean(coverage.norm[729,,K10_N30],na.rm=TRUE)
## [1] 0.8142857

Scenario 8: 2 x 2 x 2 multivariate fMRI time series with simple design

To further pinpoint the weird coverages observed in the previous scenario, we try to simplify some procedures. In this scenario, I only simulate a grid of 2 x 2 x 2 voxels (8 in total). Furthermore, the fMRI analysis now involves only one condition, using a simple blocked design experiment (20 sec ON/OFF). I only consider 2, 5 or 10 studies while increasing the sample size from 10 to 100 in steps of 10. Results are based on 3000 simulations.
Hence, the analysis of the time series within subjects simplifies to:

TR <- 2
nscan <- 200
total <- TR*nscan
on1 <- seq(1,total,40)
onsets <- list(on1)
duration <- list(20)
effect.null <- list(0)
DIM <- c(2,2,2)

# design matrix for generating data
design.null <- simprepTemporal(regions = 1, onsets = onsets, durations = duration,
                       hrf = "double-gamma", TR = TR, totaltime = total,
                       effectsize = effect.null)

# Actual simulated data
sim.data <- simVOLfmri(design=design.null, image=regions, base=base, dim=DIM, SNR=0.5,
             type ="gaussian", noise= "mixture", spat="gaussRF", FWHM=2, weights=w, verbose = TRUE)
  # Transform it to correct dimension (Y = t x V)
  Y.data <- t(matrix(sim.data,ncol=nscan))

# Fitting GLM model.
model.lm <- lm(Y.data ~ x)
b1 <- coef(model.lm)['x',]
COPE[,s] <- b1

Here, x contains only one predictor.
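Note that lm() with a matrix response fits one regression per column, so all 8 voxel time series are estimated in a single call. A small illustration with toy data (not the report's code):

```r
# lm() with a matrix response: one OLS fit per column (voxel).
set.seed(1)
nscan <- 200
x <- rnorm(nscan)                        # single predictor (convolved design)
Y <- matrix(rnorm(nscan * 8), ncol = 8)  # 8 voxel time series (t x V)
fit <- lm(Y ~ x)
b1 <- coef(fit)["x", ]  # one slope estimate per voxel
stopifnot(length(b1) == 8)
```

This is why the within-subject analysis reduces to a single lm() call per subject in this scenario.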
The second level GLM is identical to previous scenarios:

# Group COPE (average)
GCOPE <- apply(COPE,1,mean,na.rm=TRUE)
# Now we will do the OLS estimation of the variance
GVARCOPE <- apply(COPE,1,var,na.rm=TRUE)
# TMAP
GTMAP <- GCOPE/sqrt(GVARCOPE/(nsub))

We then continue by transforming the T-values to effect sizes, calculating a weighted average and constructing CIs around it.

The results of the empirical coverages are shown below:

Scenario 9: 16 x 16 x 16 multivariate fMRI time series with simple design

We repeat the same simulation with the larger grid of 16 x 16 x 16 voxels (4096 in total), which is the grid used in the previous scenarios. The other parameters are the same as in the previous scenario. In fact, the only difference is:

DIM <- c(16,16,16)

The results are plotted below:

By having more voxels, which amounts to having more values (as there is no correlation between voxels), we observe a similar pattern, though it appears more robust here.
Note the unexpected result that having fewer studies in the weighted variance situation leads to empirical coverages closer to the nominal level.